Bayes-Adaptive POMDPs: A New Perspective on the Explore-Exploit Tradeoff in Partially Observable Domains
Authors
Abstract
Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date have focused on the standard Markov Decision Process (MDP). Our goal is to extend these ideas to the more general Partially Observable MDP (POMDP) framework, in which the state is a hidden variable. This difficult decision-making problem can be formulated cleanly by simply extending the state to include the model parameters themselves; however, closed-form solutions are then no longer possible. This paper explores a family of approximations for solving this problem. These approximations trade off between (1) improving knowledge of the POMDP domain through interaction with the environment, (2) resolving uncertainty about the current state, and (3) choosing actions with high expected reward.
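To make the construction concrete, here is a brief sketch in LaTeX of the extended ("hyper-") state space; the Dirichlet-count notation is assumed for illustration rather than quoted from the paper:

    S' = S \times \Phi \times \Psi, \qquad
    \Pr(s' \mid s, a, \phi) = \frac{\phi^{a}_{s s'}}{\sum_{s''} \phi^{a}_{s s''}}, \qquad
    \Pr(z \mid s', a, \psi) = \frac{\psi^{a}_{s' z}}{\sum_{z'} \psi^{a}_{s' z'}}

Here \phi and \psi are Dirichlet counts over the unknown transition and observation distributions; after experiencing (s, a, s', z), the two matching counts are each incremented by one. The belief is then tracked jointly over (s, \phi, \psi), which is exactly why closed-form solutions stop being available and approximations are needed.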
Similar resources
Learning in POMDPs with Monte Carlo Tree Search
The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation an...
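As a concrete illustration of how learning during execution can be combined with planning (a sketch in the spirit of root-sampling methods such as BA-POMCP, not code from the entry above), the Python below draws a state and a model from Dirichlet posterior counts at the root, then runs UCB-guided tree search over action-observation histories. The toy domain sizes, the reward function, and all identifiers are assumptions:

    import math
    import random
    from collections import defaultdict

    S, A, Z = 2, 2, 2            # toy numbers of states, actions, observations (assumed)
    GAMMA, UCB_C = 0.95, 2.0     # discount factor and UCB exploration constant
    DEPTH, SIMS = 15, 2000       # simulation horizon and simulations per decision

    def reward(s, a):
        # Hypothetical reward: an action pays off when it matches the state.
        return 1.0 if s == a else 0.0

    def draw(probs):
        # Sample an index from a categorical distribution.
        r, acc = random.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if r <= acc:
                return i
        return len(probs) - 1

    def sample_dirichlet(alpha):
        # Draw a categorical distribution from Dirichlet(alpha).
        g = [random.gammavariate(a, 1.0) for a in alpha]
        t = sum(g)
        return [x / t for x in g]

    class BAPOMCP:
        """Root-sampling tree search over action-observation histories."""

        def __init__(self):
            self.N = defaultdict(int)     # visits per (history, action)
            self.Q = defaultdict(float)   # running value per (history, action)
            self.Nh = defaultdict(int)    # visits per history node

        def search(self, particles):
            # particles: list of (state, trans_counts, obs_counts); the counts map
            # (state, action) -> Dirichlet alphas over next states / observations.
            for _ in range(SIMS):
                s, tc, oc = random.choice(particles)
                # Root sampling: commit to one model drawn from the posterior counts.
                T = {k: sample_dirichlet(v) for k, v in tc.items()}
                O = {k: sample_dirichlet(v) for k, v in oc.items()}
                self._simulate(s, T, O, (), 0)
            return max(range(A), key=lambda a: self.Q[((), a)])

        def _simulate(self, s, T, O, hist, depth):
            if depth >= DEPTH:
                return 0.0
            self.Nh[hist] += 1

            def ucb(a):
                n = self.N[(hist, a)]
                if n == 0:
                    return float("inf")
                return self.Q[(hist, a)] + UCB_C * math.sqrt(math.log(self.Nh[hist]) / n)

            a = max(range(A), key=ucb)
            s2 = draw(T[(s, a)])
            z = draw(O[(s2, a)])
            ret = reward(s, a) + GAMMA * self._simulate(s2, T, O, hist + ((a, z),), depth + 1)
            self.N[(hist, a)] += 1
            self.Q[(hist, a)] += (ret - self.Q[(hist, a)]) / self.N[(hist, a)]
            return ret

    # Usage sketch with uniform priors and a single root particle in state 0:
    # prior_t = {(s, a): [1.0] * S for s in range(S) for a in range(A)}
    # prior_o = {(s, a): [1.0] * Z for s in range(S) for a in range(A)}
    # best_action = BAPOMCP().search([(0, prior_t, prior_o)])

A full implementation would also expand nodes lazily, use rollouts beyond the tree, and re-root the search after each real action-observation pair.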
Exploration in POMDPs
In recent work, Bayesian methods for exploration in Markov decision processes (MDPs) and for solving known partially-observable Markov decision processes (POMDPs) have been proposed. In this paper we review the similarities and differences between those two domains and propose methods to deal with them simultaneously. This enables us to attack the Bayes-optimal reinforcement learning problem in...
Sample-based Search Methods for Bayes-adaptive Planning
A fundamental issue for control is acting in the face of uncertainty about the environment. Amongst other things, this induces a trade-off between exploration and exploitation. A model-based Bayesian agent optimizes its return by maintaining a posterior distribution over possible environments and considering all possible future paths. This optimization is equivalent to solving a Markov Decision...
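The sentence cut off above refers to the Bayes-adaptive construction for the fully observable case: the posterior is summarized by Dirichlet counts \phi, and planning becomes an MDP over hyper-states (s, \phi). A sketch of its Bellman equation, with notation assumed for illustration:

    V(s, \phi) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s'} \frac{\phi^{a}_{s s'}}{\sum_{s''} \phi^{a}_{s s''}} \, V\big(s', \phi + \delta^{a}_{s s'}\big) \Big]

where \delta^{a}_{s s'} adds one to the count of the transition just taken. The hyper-state space grows with every distinct count vector, which is what motivates the sample-based search methods this entry describes.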
Planning in Stochastic Domains: Problem Characteristics and Approximation
This paper is about planning in stochastic domains by means of partially observable Markov decision processes (POMDPs). POMDPs are difficult to solve. This paper considers problems where one does not know the true state of the world but nevertheless has a fairly good idea about it, and uses such problem characteristics to transform POMDPs into approximately equivalent ones that are much easier to solve...
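One generic shortcut in this spirit (shown purely as an illustration; it is not necessarily the transformation this paper proposes) is the most-likely-state heuristic: when the belief is sharply peaked, act as if the most probable state were the true one, and fall back to an information-gathering action otherwise. A minimal sketch, with all names and the threshold assumed:

    import math

    def entropy(belief):
        # Shannon entropy of a discrete belief over states.
        return -sum(p * math.log(p) for p in belief if p > 0.0)

    def choose_action(belief, mdp_policy, info_action, threshold=0.5):
        # If the belief is concentrated, follow a policy computed for the
        # underlying fully observable MDP at the most likely state; otherwise
        # take a designated information-gathering action (both are assumed inputs).
        if entropy(belief) < threshold:
            s_ml = max(range(len(belief)), key=lambda s: belief[s])
            return mdp_policy[s_ml]
        return info_action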
Journal:
Volume / Issue:
Pages: -
Publication date: 2008